Audio-visual speech processing and analysis based on subspace projections
Authors
Abstract
Similar Papers
Audio-visual Speech Processing
Speech is inherently bimodal, relying on cues from the acoustic and visual speech modalities for perception. The McGurk effect demonstrates that when humans are presented with conflicting acoustic and visual stimuli, the perceived sound may not exist in either modality. This effect has formed the basis for modelling the complementary nature of acoustic and visual speech by encapsulating them in...
Audio-Visual Speech Processing: Progress and Challenges
This keynote focuses on using visual channel information to improve automatic speech processing for human computer interaction. Two main issues are discussed: the extraction and representation of visual speech, as well as its fusion with traditional acoustic information. The talk mostly considers applying these techniques to automatic speech recognition, however additional areas of interest are...
Some Experiments in Audio-Visual Speech Processing
Natural speech is produced by the vocal organs of a particular talker. The acoustic features of the speech signal must therefore be correlated with the movements of the articulators (lips, jaw, tongue, velum,...). For instance, hearing impaired people (and not only them) improve their understanding of speech by lip reading. This chapter is an overview of audiovisual speech processing with empha...
Audio-Visual Speech Recognition Based on AAM Parameter and Phoneme Analysis of Visual Feature
As one of the techniques for robust speech recognition under noisy environment, audio-visual speech recognition using lip dynamic visual information together with audio information is attracting attention and the research is advanced in recent years. Since visual information plays a great role in audio-visual speech recognition, what to select as the visual feature becomes a significant point. ...
Speech Extraction Based on ICA and Audio-Visual Coherence
We present a new approach to the source separation problem for multiple speech signals. Using the extra visual information of the speaker’s face, the method aims to extract an acoustic speech signal from other acoustic signals by exploiting its coherence with the speaker’s lip movements. We define a statistical model of the joint probability of visual and spectral audio input for quantifying th...
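The ICA-based separation described above can be sketched in a minimal, self-contained way: run FastICA on a two-channel mixture, then pick the recovered component whose amplitude envelope is most correlated with a lip-movement signal. The synthetic signals, the numpy-only FastICA loop, and the envelope-correlation selection criterion are all illustrative assumptions, not the authors' exact statistical model.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 4000)

# Toy "speech": an amplitude-modulated tone; toy interferer: a square wave.
speech = np.sin(2 * np.pi * 9 * t) * (1 + 0.5 * np.sin(2 * np.pi * 2 * t))
noise = np.sign(np.sin(2 * np.pi * 13 * t))
S = np.c_[speech, noise]
A = np.array([[1.0, 0.6], [0.4, 1.0]])           # mixing matrix (unknown in practice)
X = S @ A.T + 0.01 * rng.standard_normal(S.shape)  # two-microphone mixture

# --- FastICA (symmetric, tanh nonlinearity), numpy only ---
Xc = X - X.mean(axis=0)
d, E = np.linalg.eigh(np.cov(Xc.T))
white = E @ np.diag(d ** -0.5) @ E.T             # whitening matrix
Z = Xc @ white.T

W = rng.standard_normal((2, 2))
for _ in range(200):
    WZ = Z @ W.T
    G, Gp = np.tanh(WZ), 1 - np.tanh(WZ) ** 2
    W_new = (G.T @ Z) / len(Z) - np.diag(Gp.mean(axis=0)) @ W
    U, _, Vt = np.linalg.svd(W_new)              # symmetric decorrelation
    W = U @ Vt
comps = Z @ W.T                                  # estimated sources (up to sign/order)

# Visual cue: assume lip aperture tracks the speech envelope (simulated here).
lip = 1 + 0.5 * np.sin(2 * np.pi * 2 * t)
env = np.abs(comps)                              # crude amplitude envelopes
corr = [abs(np.corrcoef(env[:, k], lip)[0, 1]) for k in range(2)]
speech_hat = comps[:, int(np.argmax(corr))]      # the audio-visually coherent component
```

In this sketch the visual stream resolves the permutation ambiguity inherent to ICA: separation alone cannot say which output is the target speaker, but coherence with the lip movement can.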
Journal
Journal title: Scientific and Technical Journal of Information Technologies, Mechanics and Optics
Year: 2018
ISSN: 2226-1494
DOI: 10.17586/2226-1494-2018-18-2-243-254